An MCP (Model Context Protocol) tool for company whitelist/blacklist lookup with fuzzy matching. Designed for the Praktikantenamt AI-Assistant to validate company names for internship contracts.
cd mcp-tools/company-lookup
pip install -e .
python create_sample_data.py
This creates data/companies.xlsx with sample whitelist and blacklist companies.
# Look up a company
company-lookup lookup -e data/companies.xlsx -q "Siemens AG"
# Look up with fuzzy matching
company-lookup lookup -e data/companies.xlsx -q "Seimens" -t 70
# List all companies
company-lookup list -e data/companies.xlsx
# List only whitelisted companies
company-lookup list -e data/companies.xlsx --status whitelist
# Show statistics
company-lookup stats -e data/companies.xlsx
# Batch lookup from file
company-lookup batch -e data/companies.xlsx -i company_names.txt -f csv
# Create template Excel file
company-lookup create-template -o my_companies.xlsx
# Start REST API server
company-lookup serve -e data/companies.xlsx -p 8000
# Build the image
docker-compose build
# Start MCP server with SSE (default)
docker-compose up company-mcp
# Start REST API server
docker-compose --profile api up
# Start MCP server with stdio
docker-compose --profile stdio up
| Service | Port | Description |
|---|---|---|
| company-mcp | 8080 | MCP SSE transport (default) |
| company-api | 8000 | FastAPI REST API |
| company-mcp-stdio | - | MCP stdio transport |
- ./data:/app/data:ro - Company Excel file (read-only)
- ./results:/app/results - Output directory

| Method | Endpoint | Description |
|---|---|---|
| POST | /lookup | Look up a company |
| POST | /lookup/batch | Batch lookup multiple companies |
| GET | /companies | List all companies |
| GET | /companies/whitelist | List whitelisted companies |
| GET | /companies/blacklist | List blacklisted companies |
| GET | /stats | Get database statistics |
| POST | /upload | Upload Excel file |
| GET | /health | Health check |
curl -X POST http://localhost:8000/lookup \
-H "Content-Type: application/json" \
-d '{"company_name": "Siemens", "threshold": 80}'
Response:
{
"query": "Siemens",
"status": "whitelisted",
"confidence": 0.95,
"is_approved": true,
"is_blocked": false,
"best_match": {
"company_name": "Siemens AG",
"similarity_score": 95.2,
"status": "whitelisted",
"is_exact_match": false
}
}
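The same request can be issued from Python. The following is a minimal sketch using only the standard library, assuming the server started by `company-lookup serve` is listening on localhost:8000. The helper names `build_lookup_request` and `lookup_company` are hypothetical and not part of the package.

```python
import json
import urllib.request

# Assumed: host and port taken from the serve example above.
API_URL = "http://localhost:8000/lookup"


def build_lookup_request(company_name: str, threshold: int = 80) -> urllib.request.Request:
    """Build a POST request that mirrors the curl call above."""
    payload = json.dumps({"company_name": company_name, "threshold": threshold})
    return urllib.request.Request(
        API_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def lookup_company(company_name: str, threshold: int = 80) -> dict:
    """Send the request and decode the JSON body (requires a running server)."""
    request = build_lookup_request(company_name, threshold)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, `lookup_company("Siemens")` should return a dictionary shaped like the response above, so fields such as `status` and `is_approved` can be read directly.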
When used as an MCP server, the following tools are available:
lookup_company
Look up a company in the whitelist/blacklist database.
Parameters:
- company_name (required): The name of the company to look up
- threshold (optional, default: 80): Minimum similarity score (0-100)
- max_results (optional, default: 5): Maximum matches to return

check_company_approved
Quick check if a company is approved (whitelisted).
Parameters:
- company_name (required): The name of the company
- threshold (optional, default: 80): Minimum similarity score

check_company_blocked
Quick check if a company is blocked (blacklisted).
Parameters:
- company_name (required): The name of the company
- threshold (optional, default: 80): Minimum similarity score

list_companies
List companies in the database.
Parameters:
- status (optional, default: "all"): Filter - "all", "whitelist", or "blacklist"

get_company_stats
Get statistics about the company database.

batch_lookup
Look up multiple companies at once.
Parameters:
- company_names (required): List of company names
- threshold (optional, default: 80): Minimum similarity score

Add to ~/.claude/claude_desktop_config.json:
{
"mcpServers": {
"company-lookup": {
"command": "company-lookup-mcp",
"args": ["--transport", "stdio"],
"env": {
"COMPANY_LOOKUP_EXCEL_FILE": "/path/to/companies.xlsx"
}
}
}
}
Start the Docker container:
docker-compose up company-mcp
Add to Claude Desktop config:
{
"mcpServers": {
"company-lookup": {
"url": "http://localhost:8080/sse"
}
}
}
The Excel file should have two sheets, named Whitelist and Blacklist by default, with the same columns in each.

Whitelist sheet:
| Company Name | Category | Notes |
|---|---|---|
| Siemens AG | Technology | Major German corporation |
| BMW Group | Automotive | Car manufacturer |

Blacklist sheet:
| Company Name | Category | Notes |
|---|---|---|
| Fake Company GmbH | Unknown | Known scam company |
Column names can be customized via configuration.
| Variable | Description | Default |
|---|---|---|
| COMPANY_LOOKUP_EXCEL_FILE | Path to Excel file | - |
| COMPANY_LOOKUP_THRESHOLD | Default fuzzy threshold | 80.0 |
| COMPANY_LOOKUP_API_HOST | API server host | 0.0.0.0 |
| COMPANY_LOOKUP_API_PORT | API server port | 8000 |
| MCP_TRANSPORT | MCP transport type | stdio |
| MCP_HOST | MCP SSE host | 0.0.0.0 |
| MCP_PORT | MCP SSE port | 8080 |
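To illustrate how these variables and their defaults fit together, here is a small sketch that reads them into a typed object. It is not the package's own configuration code (that lives in `company_lookup/config/manager.py`); the `EnvSettings` class and `load_env_settings` function are hypothetical names used only for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EnvSettings:
    """Hypothetical container for the documented environment variables."""
    excel_file: Optional[str]
    threshold: float
    api_host: str
    api_port: int
    mcp_transport: str
    mcp_host: str
    mcp_port: int


def load_env_settings(env: Mapping[str, str] = os.environ) -> EnvSettings:
    """Read each variable, falling back to the default listed in the table."""
    return EnvSettings(
        excel_file=env.get("COMPANY_LOOKUP_EXCEL_FILE"),  # no default
        threshold=float(env.get("COMPANY_LOOKUP_THRESHOLD", "80.0")),
        api_host=env.get("COMPANY_LOOKUP_API_HOST", "0.0.0.0"),
        api_port=int(env.get("COMPANY_LOOKUP_API_PORT", "8000")),
        mcp_transport=env.get("MCP_TRANSPORT", "stdio"),
        mcp_host=env.get("MCP_HOST", "0.0.0.0"),
        mcp_port=int(env.get("MCP_PORT", "8080")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check without touching the real environment.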
company_lookup/config/settings.yaml:
excel:
file_path: null
whitelist_sheet: "Whitelist"
blacklist_sheet: "Blacklist"
company_name_column: "Company Name"
notes_column: "Notes"
category_column: "Category"
matching:
default_threshold: 80.0
case_sensitive: false
api:
host: "0.0.0.0"
port: 8000
The tool uses an adaptive fuzzy matching algorithm optimized for company names.
| Algorithm | Purpose |
|---|---|
| Simple Ratio | Character-level similarity |
| Partial Ratio | Best substring match |
| Token Sort Ratio | Word order independence |
| Token Set Ratio | Extra word tolerance |
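To make two of these measures concrete, the sketch below implements a simple ratio and a token sort ratio with Python's standard `difflib`. This is an assumption-laden stand-in: the README does not say which library the tool uses, and scores from `difflib` will generally differ from the tool's actual scores.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> float:
    """Character-level similarity on a 0-100 scale (case-insensitive)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_sort_ratio(a: str, b: str) -> float:
    """Sort the words of both strings first, so word order no longer matters."""

    def sort_tokens(text: str) -> str:
        return " ".join(sorted(text.lower().split()))

    return simple_ratio(sort_tokens(a), sort_tokens(b))
```

For example, `token_sort_ratio("Group BMW", "BMW Group")` is 100.0, while `simple_ratio` on the same pair is lower because the characters appear in a different order.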
Weights adjust based on query length:
| Query Type | Partial Ratio | Token Matching | Containment |
|---|---|---|---|
| Short (≤4 chars) | 40% | 40% | 10% |
| Single word | 30% | 45% | 10% |
| Multi-word | 20% | 60% | 10% |
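The weight selection in the table can be sketched as a small lookup. Two points here are assumptions rather than documented behaviour: the three listed columns do not sum to 100%, so this sketch assigns the remainder to the Simple Ratio, and it checks the "short" rule before counting words. The function name `select_weights` is hypothetical.

```python
def select_weights(query: str) -> dict[str, float]:
    """Pick scoring weights from the query's length and word count."""
    text = query.strip()
    if len(text) <= 4:
        # Short query (4 characters or fewer); assumed to take precedence.
        weights = {"partial": 0.40, "token": 0.40, "containment": 0.10}
    elif len(text.split()) == 1:
        # Single word
        weights = {"partial": 0.30, "token": 0.45, "containment": 0.10}
    else:
        # Multi-word
        weights = {"partial": 0.20, "token": 0.60, "containment": 0.10}
    # Assumption: whatever is left over goes to the Simple Ratio.
    weights["simple"] = round(1.0 - sum(weights.values()), 2)
    return weights
```

Under these assumptions, a multi-word query such as "Mercedes Benz" leans most heavily on token matching, while a short query such as "BMW" splits the weight evenly between partial and token matching.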
If all query tokens appear in the target, the score is raised to at least 88%.
Example: "BMW" → "BMW Group" gets boosted because {"bmw"} ⊆ {"bmw", "group"}
| Query | Target | Score | Why |
|---|---|---|---|
| BMW | BMW Group | 88% | Token containment boost |
| Mercedes Benz | Mercedes-Benz Group AG | 93% | Hyphen normalized |
| Seimens | Siemens AG | 77% | Typo tolerance |
| BMW (Automotive) | BMW Group | 88% | Parentheses stripped |
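The containment boost and the normalization hinted at in the table ("Hyphen normalized", "Parentheses stripped") can be sketched as follows. The exact normalization rules are not documented here, so the regular expressions below are assumptions, and `normalize` and `apply_containment_boost` are hypothetical names.

```python
import re


def normalize(name: str) -> str:
    """Lowercase, drop parenthesised text, and turn punctuation into spaces."""
    name = re.sub(r"\([^)]*\)", " ", name.lower())  # assumed: strip "(...)"
    name = re.sub(r"[^\w\s]", " ", name)            # assumed: hyphens etc. become spaces
    return " ".join(name.split())


def apply_containment_boost(query: str, target: str, score: float, floor: float = 88.0) -> float:
    """Raise the score to at least `floor` when every query token is in the target."""
    query_tokens = set(normalize(query).split())
    target_tokens = set(normalize(target).split())
    if query_tokens and query_tokens <= target_tokens:
        return max(score, floor)
    return score
```

With this sketch, "BMW" and "BMW (Automotive)" both reduce to the single token "bmw" and are boosted against "BMW Group", whereas "Seimens" shares no token with "Siemens AG" and keeps its unboosted score.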
The testing framework has two parallel components:
- Quantification tests (test_quantification.py) - Test the fuzzy matching algorithms with edge cases
- MCP evaluation (evaluate_mcp.py) - Tests LLM ability to correctly invoke MCP tools

cd mcp-tools/company-lookup
pip install -e ".[dev]"
Tests the core fuzzy matching logic with 50+ edge cases:
# Run all quantification tests
pytest tests/test_quantification.py -v
# Run with detailed output
pytest tests/test_quantification.py -v -s
# Run specific test categories
pytest tests/test_quantification.py -v -k "typo"
pytest tests/test_quantification.py -v -k "blacklist"
pytest tests/test_quantification.py -v -k "edge"
# Generate standalone report
python tests/test_quantification.py
Test categories:
Tests how well different LLMs can parse natural language and call MCP tools:
# Test with local Ollama models (default)
python tests/mcp_evaluation/evaluate_mcp.py -m llama3.2:3b qwen2.5:7b
# Test single Ollama model
python tests/mcp_evaluation/evaluate_mcp.py -m llama3.2:3b
# Test with custom Ollama endpoint
python tests/mcp_evaluation/evaluate_mcp.py -m llama3.2:3b --endpoint http://my-server:11434
# Test with Anthropic Claude (requires ANTHROPIC_API_KEY env var)
export ANTHROPIC_API_KEY=sk-ant-...
python tests/mcp_evaluation/evaluate_mcp.py --anthropic-model claude-3-haiku-20240307
# Test with OpenAI-compatible API (LM Studio, vLLM, OpenRouter)
python tests/mcp_evaluation/evaluate_mcp.py \
--openai-endpoint http://localhost:1234 \
--openai-model local-model
# Use custom test prompts
python tests/mcp_evaluation/evaluate_mcp.py -m llama3.2:3b -t my_prompts.json
# Specify custom Excel file
python tests/mcp_evaluation/evaluate_mcp.py -m llama3.2:3b -e data/companies.xlsx
# Save results to custom directory
python tests/mcp_evaluation/evaluate_mcp.py -m llama3.2:3b -o my_results/
The evaluation writes its results to tests/mcp_evaluation/results/ unless another directory is given with -o.
Metrics tracked:
| Metric | Description |
|---|---|
| Tool Call % | Percentage of prompts where tool JSON was successfully extracted |
| Company Acc % | Percentage where company name was correctly extracted |
| Status Acc % | Percentage where final status (whitelist/blacklist/unknown) was correct |
| Avg Time | Average response time per prompt |
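As an illustration of how such metrics can be derived from per-prompt outcomes, here is a minimal sketch. The `PromptResult` record and its field names are hypothetical and do not reflect the internal data model of `evaluate_mcp.py`.

```python
from dataclasses import dataclass


@dataclass
class PromptResult:
    """Hypothetical per-prompt outcome."""
    tool_called: bool      # tool JSON was successfully extracted
    company_correct: bool  # extracted company name matched the expected one
    status_correct: bool   # final status matched the expected one
    seconds: float         # response time for this prompt


def summarize(results: list[PromptResult]) -> dict[str, float]:
    """Aggregate per-prompt outcomes into percentages and an average time."""
    total = len(results)
    if total == 0:
        raise ValueError("no results to summarize")

    def pct(count: int) -> float:
        return round(100.0 * count / total, 1)

    return {
        "tool_call_pct": pct(sum(r.tool_called for r in results)),
        "company_acc_pct": pct(sum(r.company_correct for r in results)),
        "status_acc_pct": pct(sum(r.status_correct for r in results)),
        "avg_time_s": round(sum(r.seconds for r in results) / total, 2),
    }
```

Each percentage is taken over all prompts, so a prompt where no tool call was extracted also counts against the company and status accuracy in this sketch.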
The test prompts are defined in tests/mcp_evaluation/test_prompts.json:
{
"metadata": {
"version": "2.0",
"total_prompts": 30
},
"test_cases": [
{
"id": "exact_001",
"category": "exact_match",
"prompt": "Check if Siemens AG is approved for internships",
"expected_tool": "lookup_company",
"expected_result": {
"status": "whitelisted",
"is_approved": true
},
"company_name": "Siemens AG",
"difficulty": "easy",
"language": "en",
"tags": ["exact", "whitelist"]
}
]
}
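When writing custom prompt files, it can help to check their structure before running an evaluation. The sketch below loads a document in the format shown above and verifies that each test case carries a few fields. Which fields are actually mandatory is an assumption, and `load_test_cases` and `filter_by_category` are hypothetical helpers, not part of the test suite.

```python
import json

# Assumption: these fields are treated as mandatory for every test case.
REQUIRED_KEYS = {"id", "category", "prompt", "expected_tool"}


def load_test_cases(text: str) -> list[dict]:
    """Parse a test prompt document and check each case for required keys."""
    data = json.loads(text)
    cases = data["test_cases"]
    for case in cases:
        missing = REQUIRED_KEYS - case.keys()
        if missing:
            raise ValueError(f"{case.get('id', '?')}: missing {sorted(missing)}")
    return cases


def filter_by_category(cases: list[dict], category: str) -> list[dict]:
    """Return only the cases in the given category, e.g. "exact_match"."""
    return [case for case in cases if case["category"] == category]
```

A typical use would be to read the file contents with `Path("my_prompts.json").read_text()` and pass the string to `load_test_cases` before handing the file to the `-t` option.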
Categories covered:
- exact_match - Exact company name queries
- fuzzy_match - Typos and partial names
- blacklist - Blocked company queries
- unknown - Unknown company queries
- batch - Multiple company queries
- stats - Database statistics queries
- list - List companies queries
- edge_case - Complex suffixes, case, whitespace
- german - German language prompts

For complex setups, use a JSON config file:
{
"llms": [
{
"name": "Local-Llama",
"endpoint": "http://localhost:11434",
"model": "llama3.2:3b",
"api_type": "ollama",
"temperature": 0.0
},
{
"name": "Claude-Haiku",
"endpoint": "https://api.anthropic.com",
"model": "claude-3-haiku-20240307",
"api_type": "anthropic",
"api_key": "sk-ant-..."
}
]
}
Run with:
python tests/mcp_evaluation/evaluate_mcp.py -c llm_configs.json
Example output:
╭─────────────────── Results: Ollama-llama3.2:3b ───────────────────╮
│ Metric │ Value │ Percentage │
│ Total Tests │ 30 │ 100% │
│ Tool Called │ 28 │ 93.3% │
│ Correct Tool Name │ 26 │ 86.7% │
│ Company Name Correct │ 25 │ 83.3% │
│ Status Prediction │ 24 │ 80.0% │
│ Avg Response Time │ 1.23s │ - │
╰───────────────────────────────────────────────────────────────────╯
Guidelines:
For n8n workflow integration, use the REST API:
Start the API server:
company-lookup serve -e /path/to/companies.xlsx
# or with Docker:
docker-compose --profile api up
In n8n, use HTTP Request node:
- Method: POST
- URL: http://localhost:8000/lookup
- Body: {"company_name": "{{$json.company_name}}"}

black company_lookup/
mypy company_lookup/
ruff check company_lookup/
company_lookup/
├── cli.py # Click CLI interface
├── api.py # FastAPI REST endpoints
├── mcp_server.py # MCP server (SSE/stdio)
├── config/
│ ├── manager.py # Configuration management
│ └── settings.yaml # Default settings
├── core/
│ ├── excel_reader.py # Excel file parsing
│ ├── fuzzy_matcher.py # Fuzzy matching engine
│ └── lookup_engine.py # Main lookup logic
├── data/
│ └── schemas.py # Pydantic models
└── output/
├── formatter.py # Console formatting
└── exporter.py # JSON/CSV export
MIT License